Abstract: The design of the channel part of a digital communication
system (e.g., error correction, modulation) is heavily based on the
assumption that the data to be transmitted forms a fair bit stream.
However, simple source encoders such as short Huffman codes generate
bit streams that poorly match this assumption. As a result, the
channel input distribution does not match the original design
criteria. In this work, a simple method called half Huffman coding
(halfHc) is developed. halfHc transforms a Huffman code into a source
code whose output is more similar to a fair bit stream. This is
achieved by permuting the codewords such that the frequency of 1s at
the output is close to 0.5. The permutations are such that the
optimality in terms of achieved compression ratio is preserved.
halfHc is applied in a practical example, and the resulting overall
system performs better than when conventional Huffman coding is used.

I. Introduction 

The key part of a digital communication system is the 
binary interface between source coding and channel coding 
[1, p. 1-3]. The ultimate objective of source coding is to
transform the data into a sequence of 0s and 1s that is
indistinguishable from a fair bit stream, i.e., a sequence of 
independent and identically distributed (iid) equiprobable bits. 
Accordingly, most work on channel coding starts with the 
assumption that the data to be transmitted comes as a fair 
bit stream. 

In [2], we have shown that the capacity achieving input probability
mass function (pmf) of a channel can be approximated arbitrarily well
by parsing a fair bit stream by a matcher code, and by doing so,
capacity achieving modulation can be established [3]. However, this
result is heavily based on the assumption that the binary interface
is a fair bit stream. Asymptotically in terms of source encoder
complexity, the binary interface can indeed be turned into a fair bit
stream [4]. In practice, however, we are often restricted to simple
encoders, and the resulting binary output differs significantly from
a fair bit stream. As a simple measure of how close the generated bit
stream is to a fair bit stream, the frequency of generated 1s can be
considered. For a fair bit stream, this frequency should be 0.5. One
possibility to achieve this goal is to pass the source encoder output
through a scrambler [5]. The drawback of this approach is that the
corresponding descrambler has to be known and implemented at the
receiver [6].



In this work we propose an algorithm called half Huffman coding
(halfHc) that constructs prefix-free source codes. halfHc achieves
the optimal compression ratio (i.e., the same as Huffman coding
(Hc)), and the frequency of 1s of halfHc is closer to 0.5 than the
frequency of 1s of Hc. As in the case of conventional Hc, no
additional descrambler at the receiver is required. We apply halfHc
to the shaping problem in [7]. We show that the resulting channel
input pmf is closer to the one predicted by the fair bit stream
assumption than the channel input pmf resulting from conventional Hc.
A complete implementation of halfHc in Matlab can be found at our
website [8].

The remainder of this paper is organized as follows. In Section II,
we state the main problem. The idea of our solution is formulated in
Section III. We then formulate in Section IV our idea as a
mathematical optimization problem, propose methods for solving it,
and formulate our new algorithm halfHc. Finally, in Section V, we
apply halfHc to a practical example.

II. Problem Statement 
A. Motivating example 

In [7], we consider a channel with three input symbols r, l, m with
the associated costs

$$w = (0.18, 0.18, 0.31)^T, \tag{1}$$

where $(\cdot)^T$ denotes the transpose. The channel input pmf $p$ is
subject to the cost constraint

$$w^T p \le S = 0.2063. \tag{2}$$

The optimal channel input probability mass function (pmf) was
calculated as

$$p^* = (0.3988, 0.3988, 0.2023)^T. \tag{3}$$

Note that $p^*$ fulfills the cost constraint with equality. The class
of pmfs that can be generated at the channel input by parsing a fair
bit stream by a matcher code are the dyadic pmfs [2]. A pmf is dyadic
if each entry $p_i$ can be written in the form

$$p_i = 2^{-\ell}, \quad \ell \in \mathbb{N}. \tag{4}$$
Fig. 1. We first compress the text to a binary sequence by Hc and then
match the binary sequence to the design criteria by a matcher code. At the
output of the matcher, a sequence of the symbols r, l, m is generated.



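The problem of finding the optimal matcher code can be stated as

$$\begin{aligned}
\text{minimize} \quad & D(d \| p^*) \\
\text{subject to} \quad & w^T d \le S \\
& d \text{ is a dyadic pmf,}
\end{aligned} \tag{5}$$

where $D(\cdot \| \cdot)$ is the Kullback-Leibler (KL) distance, which
is defined as

$$D(p \| q) = \sum_i p_i \log \frac{p_i}{q_i} \tag{6}$$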



with log denoting the natural logarithm. In [7], the algorithm cost
constrained geometric Huffman coding (CCGhc) is developed, and it is
shown that for a fair bit stream at the binary interface, the
resulting matcher code optimally solves Problem (5). The solution $d$
achieves

$$D(d \| p^*) = 0.0048392 \tag{7}$$
$$w^T d = 0.20607 < S. \tag{8}$$



However, the bit stream at the binary interface is generated by
compressing the English text [9] by a simple Hc, and this bit stream
is then parsed by the matcher code, see Fig. 1. By Hc we refer to
Huffman coding as implemented by huffmandict.m in the Matlab
Communications System Toolbox. The effective pmf, the effective
KL-distance, and the effective cost at the output of the matcher are
respectively given by

$$d_{\mathrm{Hc}} = (0.40663, 0.35761, 0.23576)^T \tag{9}$$
$$D(d_{\mathrm{Hc}} \| p^*) = 0.0050066 \tag{10}$$
$$w^T d_{\mathrm{Hc}} = 0.21065 > S, \tag{11}$$

i.e., what is actually observed significantly differs from what
we would expect under the fair bit stream assumption.
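
The values in (10) and (11) follow directly from the vectors in (1),
(3) and (9). The following Python lines are a minimal sketch of that
check and are not part of the original Matlab implementation; the
function name kl_divergence and all variable names are illustrative
assumptions.

```python
import math

def kl_divergence(p, q):
    """KL distance D(p||q) with the natural logarithm, as in (6)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

w = (0.18, 0.18, 0.31)               # symbol costs, see (1)
S = 0.2063                           # cost constraint, see (2)
p_star = (0.3988, 0.3988, 0.2023)    # optimal input pmf, see (3)
d_hc = (0.40663, 0.35761, 0.23576)   # effective pmf after Hc, see (9)

kl_hc = kl_divergence(d_hc, p_star)
cost_hc = sum(wi * di for wi, di in zip(w, d_hc))

print(f"KL distance: {kl_hc:.5f}")
print(f"average cost: {cost_hc:.5f}, exceeds S: {cost_hc > S}")
```

Up to rounding of the printed pmf entries, the computed KL distance
and cost should agree with (10) and (11).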

B. Approach 

While observations (9)-(11) may occur since the effectively observed
values are a realization of a random process, which can always
diverge from the expected values, the reason can also be that the
assumption of a fair bit stream is false. To evaluate the quality of
the binary interface, we calculate the effective relative number of
1s. For a fair bit stream, this number should be close to q = 0.5.
However, the effective frequency that we observe after applying Hc to
the English text [9] is

$$q_{\mathrm{Hc}} = 0.45821. \tag{12}$$



A maximum likelihood estimator (MLE) of q yields for the bit stream
at the output of Hc the 95% confidence interval (0.4464, 0.47006).
Since the value q = 0.5 is not contained in this interval, the
hypothesis of having a fair bit stream after applying Hc is rejected.
Our approach is therefore to look for optimal prefix-free source
codes that yield a q close to 0.5.

TABLE I
Hc for the English text [9]

symbol    i    p_i      c_i          l(i)   j
(space)   1    0.1699   000          3      1
e         2    0.0964   110          3      1
t         3    0.0777   0010         4      2
a         4    0.0717   0100         4      2
i         5    0.0663   0101         4      2
o         6    0.0645   0111         4      2
n         7    0.0614   1000         4      2
r         8    0.0530   1010         4      2
s         9    0.0500   1110         4      2
h         10   0.0373   00110        5      3
c         11   0.0325   01101        5      3
l         12   0.0325   01100        5      3
m         13   0.0277   10110        5      3
u         14   0.0277   10011        5      3
d         15   0.0235   11110        5      3
f         16   0.0199   11111        5      3
g         17   0.0181   001110       6      4
y         18   0.0145   100100       6      4
p         19   0.0139   100101       6      4
b         20   0.0133   101110       6      4
w         21   0.0114   101111       6      4
v         22   0.0054   00111100     8      5
k         23   0.0042   00111101     8      5
x         24   0.0024   001111100    9      6
q         25   0.0018   001111110    9      6
z         26   0.0018   001111101    9      6
j         27   0.0012   001111111    9      6

III. Main Idea

As stated in [10, Sec. 5.8], optimal prefix-free source codes are not
unique, and for a given pmf, Hc constructs one optimal code. All
other optimal codes can be obtained by applying appropriate
permutations of the code generated by Hc. In this section we will
show how the frequency of 1s can be influenced by specific
permutations.

A. Frequency of 1s

Denote the number of symbols of the considered source by n. Without
loss of generality, we assume that the pmf p describing the source is
ordered with decreasing probabilities, i.e.,

$$p_1 \ge p_2 \ge \dots \ge p_n. \tag{13}$$

Denote by C the ordered set of codewords generated by Hc, i.e.,

$$C = (c_1, \dots, c_n). \tag{14}$$

Denote by $1_{c_i}$ the number of 1s in the ith codeword. We write
the length of codeword $c_i$ as $l(i)$. Denote by m the number of
distinct codeword lengths and order and index the set of distinct
codeword lengths $\{\ell_j\}_{j=1}^m$. For the English text [9] we
have $i$, $p_i$, $c_i$, $l(i)$ and $j$ displayed in Table I. Denote
the pmf of the codeword lengths by r, i.e.,

$$r_j = \sum_{i: l(i) = \ell_j} p_i, \quad j = 1, \dots, m. \tag{15}$$

Denote by $p_{i|j}$ the probability that symbol i is generated given
that the codeword length is $\ell_j$, i.e., if $p_{i|j} > 0$, then

$$p_{i|j} = \frac{p_i}{r_j}. \tag{16}$$

Denote by $N_j$ the expected number of 1s conditioned on codeword
length $\ell_j$, i.e.,

$$N_j = \sum_{i: l(i) = \ell_j} p_{i|j} 1_{c_i}. \tag{17}$$

Thus the expected frequency q of 1s, given by the expected number of
1s N per average length L, can be written as

$$q = \frac{N}{L} \tag{18}$$
$$= \frac{1}{L} \sum_{j=1}^m \sum_{i: l(i) = \ell_j} p_{i|j} r_j 1_{c_i}. \tag{19}$$
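
As a numerical illustration of (18), the expected frequency of 1s can
be evaluated for the code of Table I. The sketch below is in Python
rather than the Matlab of the reference implementation; the list
TABLE_I is a hand transcription of the table (the codeword for i = 16
is taken to be 11111), and the function name ones_frequency is an
assumption made for this example.

```python
# Table I as (p_i, c_i) pairs, ordered by decreasing probability.
TABLE_I = [
    (0.1699, "000"), (0.0964, "110"),
    (0.0777, "0010"), (0.0717, "0100"), (0.0663, "0101"),
    (0.0645, "0111"), (0.0614, "1000"), (0.0530, "1010"),
    (0.0500, "1110"),
    (0.0373, "00110"), (0.0325, "01101"), (0.0325, "01100"),
    (0.0277, "10110"), (0.0277, "10011"), (0.0235, "11110"),
    (0.0199, "11111"),
    (0.0181, "001110"), (0.0145, "100100"), (0.0139, "100101"),
    (0.0133, "101110"), (0.0114, "101111"),
    (0.0054, "00111100"), (0.0042, "00111101"),
    (0.0024, "001111100"), (0.0018, "001111110"),
    (0.0018, "001111101"), (0.0012, "001111111"),
]

def ones_frequency(code):
    """Expected frequency of 1s, q = N / L, as in (18)."""
    expected_ones = sum(p * cw.count("1") for p, cw in code)   # N
    expected_length = sum(p * len(cw) for p, cw in code)       # L
    return expected_ones / expected_length

q_hc = ones_frequency(TABLE_I)
print(f"q = {q_hc:.4f}")
```

Because the table probabilities are the empirical symbol frequencies
of the text, this value should agree with the effective frequency
reported in (12) up to rounding.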



B. Permutations of C 

Any permutation of codewords of the same length again gives an
optimal code. However, while the achieved compression rate is
invariant under these permutations, the mean number of 1s is not.
This is because in general, there are codewords of the same length
that occur with different probabilities. A naive approach consists of
searching through all permutations and choosing the one for which the
frequency of 1s is closest to 0.5. However, the number of such
permutations quickly becomes prohibitively large. For instance, for
the code displayed in Table I, this number is approximately

$$2!\,7!\,7!\,5!\,2!\,4! \approx 3 \cdot 10^{11}. \tag{20}$$
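
The count in (20) is the product of the factorials of the number of
codewords per distinct length. A short Python check, with the group
sizes read off Table I (variable names are illustrative):

```python
from math import factorial, prod

# Number of codewords for the lengths 3, 4, 5, 6, 8, 9 in Table I.
group_sizes = [2, 7, 7, 5, 2, 4]
num_permutations = prod(factorial(k) for k in group_sizes)
print(num_permutations)   # 292626432000
```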



Consider now the set $C_j$ of codewords of length $\ell_j$. We
consider two special permutations of $C_j$: first, the one that
maximizes the expected number of 1s and second, the one that
minimizes the expected number of 1s. The first is obtained by
ordering the codewords in $C_j$ with decreasing number of 1s, and the
second one is obtained by ordering the codewords in $C_j$ with
increasing number of 1s. Denote the corresponding permutations by
$\pi_j^+$ and $\pi_j^-$, respectively. For example, for the code in
Table I, the corresponding permutations for codeword length
$\ell_1 = 3$ are

$$\pi_1^+ : 1 \mapsto 2, \; 2 \mapsto 1 \tag{21}$$
$$\pi_1^- : 1 \mapsto 1, \; 2 \mapsto 2. \tag{22}$$

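The maximum expected number $N_j^+$ of 1s in a codeword conditioned
on the codeword length $\ell_j$ is given by

$$N_j^+ = \sum_{i: l(i) = \ell_j} p_{i|j} 1_{c_{\pi_j^+(i)}}, \tag{23}$$

and the minimum expected number of 1s $N_j^-$ is given by

$$N_j^- = \sum_{i: l(i) = \ell_j} p_{i|j} 1_{c_{\pi_j^-(i)}}. \tag{24}$$

For example, in case of codeword length 3 (corresponding to
$\ell_1$), we can see from Table I that

$$N_1^+ = \frac{p_1}{p_1 + p_2} \cdot 2 + \frac{p_2}{p_1 + p_2} \cdot 0 = 1.276$$
$$N_1^- = \frac{p_1}{p_1 + p_2} \cdot 0 + \frac{p_2}{p_1 + p_2} \cdot 2 = 0.724.$$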
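
The two extreme values for one codeword length can be computed by
pairing the probabilities, sorted in decreasing order, with the
numbers of 1s sorted either in decreasing or in increasing order. The
Python function below is a minimal sketch of this step; its name and
signature are assumptions for illustration only.

```python
def conditional_ones(probs, codewords):
    """Return (N_plus, N_minus) for one group of equal-length codewords."""
    total = sum(probs)                               # r_j
    probs_desc = sorted(probs, reverse=True)
    ones_desc = sorted((cw.count("1") for cw in codewords), reverse=True)
    # Most 1s on the most probable symbols maximizes the expectation.
    n_plus = sum(p * k for p, k in zip(probs_desc, ones_desc)) / total
    # Fewest 1s on the most probable symbols minimizes it.
    n_minus = sum(p * k for p, k in zip(probs_desc, reversed(ones_desc))) / total
    return n_plus, n_minus

# Codewords of length 3 in Table I.
n_plus, n_minus = conditional_ones([0.1699, 0.0964], ["000", "110"])
print(round(n_plus, 3), round(n_minus, 3))   # 1.276 0.724
```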

The idea is now to choose for each codeword length $\ell_j$ between
$N_j^+$ and $N_j^-$, with the objective to get the overall expected
number of 1s divided by the expected codeword length as close to
q = 0.5 as possible. This means, we would like to solve

$$q = \frac{1}{L} \sum_{j=1}^m r_j N_j^{z_j} \approx 0.5, \tag{25}$$

where each $z_j$ takes values in $\{+, -\}$. The corresponding
optimization problem is discussed in the next section.

IV. Half Huffman Coding 
We will express goal (25) in terms of a mathematical optimization
problem. For each index j corresponding to a specific codeword length
$\ell_j$, we can either take the maximizing permutation $\pi_j^+$ or
the minimizing permutation $\pi_j^-$. We introduce a binary vector
$x \in \{0, 1\}^m$, which serves as a selection variable between both
permutations. By using (25), the expected frequency of 1s can be
expressed in terms of $x_j$ as

$$q = \frac{1}{L} \sum_{j=1}^m r_j N_j^{z_j} \tag{26}$$
$$= \frac{1}{L} \sum_{j=1}^m r_j \left[ N_j^+ + x_j (N_j^- - N_j^+) \right]. \tag{27}$$

For $x_j = 0$, we choose $N_j^+$, and for $x_j = 1$, we choose
$N_j^-$.



We measure the quality of our selection x by the absolute deviation
between q and 0.5. Hence, in vector notation, the objective we want
to minimize has the form

$$|q - 0.5| = \left| \frac{1}{L} [r \circ (n_- - n_+)]^T x + \frac{1}{L} n_+^T r - 0.5 \right|, \tag{28}$$

where $\circ$ denotes the elementwise Hadamard vector product, and
$n_-$ and $n_+$ are defined as

$$n_- = (N_1^-, \dots, N_m^-)^T \tag{29}$$
$$n_+ = (N_1^+, \dots, N_m^+)^T. \tag{30}$$

For notational convenience, we further substitute

$$a = \frac{1}{L} [r \circ (n_- - n_+)] \tag{31}$$
$$b = \frac{1}{L} n_+^T r - 0.5. \tag{32}$$



We can now state the optimization problem

$$\begin{aligned}
\underset{x}{\text{minimize}} \quad & |a^T x + b| \\
\text{subject to} \quad & x_j \in \{0, 1\}, \quad j = 1, \dots, m.
\end{aligned} \tag{33}$$

Introducing the epigraph variable $t \in \mathbb{R}_+$ [11], we can
directly see that the problem (33) is equivalent to

$$\begin{aligned}
\underset{x, t}{\text{minimize}} \quad & t \\
\text{subject to} \quad & -t \le a^T x + b \le t \\
& x_j \in \{0, 1\}, \quad j = 1, \dots, m.
\end{aligned} \tag{34}$$

This problem is a mixed integer linear program (MILP) in its
canonical form [12], and is generally hard to solve. Since we are
focusing on short-length Huffman codes, the number of different
codeword lengths m will not be too large. As discussed below, we can
use standard methods for solving this problem globally.

A. Optimization methods 

1) Naive exhaustive search: In order to find the global solution to
problem (33), we can simply try all possible vectors
$x \in \{0, 1\}^m$. In our example from Table I, m = 6, that is, we
have to choose between $2^6 = 64$ possible vectors x. Thus, in our
example we have overcome the huge number of possible permutations
(20), but in general, we still might be constrained by the
combinatorial nature of problem (34), since the complexity of
exhaustive search grows exponentially in the number of distinct
codeword lengths m. This problem can be overcome by considering
smarter search algorithms, as discussed next.
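
For m = 6 the exhaustive search is a loop over 64 candidates. The
following Python sketch illustrates it for the code of Table I. The
vectors rn_plus and rn_minus hold the products $r_j N_j^+$ and
$r_j N_j^-$; they were evaluated by hand from Table I and rounded to
four decimals, so they should be read as assumptions of this example
rather than values stated in the paper.

```python
from itertools import product

L = 4.1516   # average codeword length of the code in Table I

# r_j * N_j^+ and r_j * N_j^- for the lengths 3, 4, 5, 6, 8, 9
# (hand-evaluated from Table I, rounded; assumption of this sketch).
rn_plus = [0.3398, 0.8742, 0.6670, 0.2529, 0.0438, 0.0444]
rn_minus = [0.1928, 0.7765, 0.5968, 0.2316, 0.0426, 0.0420]

a = [(lo - hi) / L for hi, lo in zip(rn_plus, rn_minus)]   # see (31)
b = sum(rn_plus) / L - 0.5                                 # see (32)

def deviation(x):
    """Signed deviation q - 0.5 = a^T x + b for a selection vector x."""
    return sum(aj * xj for aj, xj in zip(a, x)) + b

# Try all 2^m selection vectors and keep the one with smallest |q - 0.5|.
x_best = min(product((0, 1), repeat=len(a)), key=lambda x: abs(deviation(x)))
q_best = 0.5 + deviation(x_best)
print(x_best, round(q_best, 4))
```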

2) Combinatorial feasibility method via bisection: We can also
exploit some structure in the MILP formulation (34) by using a
bisection method [11, p. 146]. Suppose we set the epigraph variable
to a fixed value t. We can now try to find a feasible solution to the
remaining combinatorial feasibility problem

$$\begin{aligned}
\text{find} \quad & x \\
\text{subject to} \quad & -t \le a^T x + b \le t \\
& x_j \in \{0, 1\}, \quad j = 1, \dots, m.
\end{aligned} \tag{35}$$

There are two possible cases that can occur:

1) If we find a feasible solution, the particular choice of t is
greater than or equal to the smallest possible value. Hence the value
of t can be further decreased.

2) When there is no feasible solution, the choice of t was too small
and we have to increase it.

After checking both cases we can solve the feasibility problem with
an updated version of t, and repeat until convergence. This approach
is summarized in Algorithm 1.

3) Specific branch and bound method: Formulation (33) falls into the
class of problems discussed in [13, Sec. 2]. Thus, if none of the
methods proposed in Subsection IV-A1 and Subsection IV-A2 can solve
the problem in acceptable time, a specific branch and bound solver
for problem (33) can be implemented. Such a solver still finds the
optimal solution, although its running time is not guaranteed to be
short.



Algorithm 1.

set l, u, tolerance ε > 0
repeat
    1. t := (l + u)/2
    2. Solve the combinatorial feasibility problem (35)
    3. if (35) is feasible
           decrease u := t
       else
           increase l := t
until u - l < ε
return x
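
The bisection loop of Algorithm 1 can be sketched in Python as
follows. The paper leaves open how the feasibility problem (35) is
solved; this sketch simply enumerates all x, which is an assumption
made to keep the example self-contained. The function names and the
small test vectors a_demo and b_demo are hypothetical.

```python
from itertools import product

def find_feasible(a, b, t):
    """Return some x in {0,1}^m with |a^T x + b| <= t, or None."""
    for x in product((0, 1), repeat=len(a)):
        if abs(sum(aj * xj for aj, xj in zip(a, x)) + b) <= t:
            return x
    return None

def bisection(a, b, eps=1e-6):
    """Bisection on the epigraph variable t, following Algorithm 1."""
    x_best = (0,) * len(a)
    lower, upper = 0.0, abs(b)        # x = 0 is feasible for t = |b|
    while upper - lower >= eps:
        t = (lower + upper) / 2
        x = find_feasible(a, b, t)
        if x is not None:
            upper, x_best = t, x      # feasible: decrease u
        else:
            lower = t                 # infeasible: increase l
    return x_best

# Hypothetical toy instance with m = 3.
a_demo, b_demo = [-0.30, -0.15, -0.05], 0.22
x_demo = bisection(a_demo, b_demo)
print(x_demo)   # (0, 1, 1)
```

The returned x is optimal up to the tolerance eps, so eps has to be
smaller than the gap between the best and the second best objective
value if the exact minimizer is required.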

B. Half Huffman coding 

We can now state halfHc, see Algorithm 2 for a summary. In detail, we
are given a pmf with entries sorted in descending order. First, we
calculate the conventional Huffman code C = Hc(p). Then, for each
codeword length $\ell_j$, we determine the maximum and minimum
permutations $\pi_j^+$ and $\pi_j^-$, respectively. We use these
permutations to calculate the vectors $n_+$, $n_-$ of maximum and
minimum expected numbers of 1s. For the resulting vectors, we solve
Problem (33) by any method from Subsection IV-A in order to find an
optimal selection vector x. The selection vector now determines which
permutation has to be applied for each codeword length $\ell_j$,
i.e.,

$$\pi_j = \begin{cases} \pi_j^+, & \text{if } x_j = 0 \\ \pi_j^-, & \text{if } x_j = 1 \end{cases} \qquad j = 1, \dots, m. \tag{36}$$

Finally, the resulting permutation $\pi = (\pi_1, \dots, \pi_m)$ is
applied to get the final code, i.e., halfHc(p) = $\pi(C)$. A complete
implementation of halfHc in Matlab can be found at our website [8].

Algorithm 2. (halfHc)

p_1 ≥ ... ≥ p_n
1. C = Hc(p)
2. find n_+, n_- via (23), (24)
3. x = solution of (33) via any method from Subsection IV-A
4. π = (π_1, ..., π_m) according to (36)
return π(C)
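
Steps 2 to 4 of Algorithm 2 can be illustrated with a compact Python
sketch. It is not the authors' Matlab implementation: step 1 is
skipped and the Huffman code is taken as given from Table I (a hand
transcription), step 3 uses exhaustive search, and the function name
half_hc and its return values are assumptions of this example.

```python
from itertools import product

# Huffman code of Table I as (p_i, c_i), ordered by decreasing probability.
HC = [
    (0.1699, "000"), (0.0964, "110"),
    (0.0777, "0010"), (0.0717, "0100"), (0.0663, "0101"),
    (0.0645, "0111"), (0.0614, "1000"), (0.0530, "1010"),
    (0.0500, "1110"),
    (0.0373, "00110"), (0.0325, "01101"), (0.0325, "01100"),
    (0.0277, "10110"), (0.0277, "10011"), (0.0235, "11110"),
    (0.0199, "11111"),
    (0.0181, "001110"), (0.0145, "100100"), (0.0139, "100101"),
    (0.0133, "101110"), (0.0114, "101111"),
    (0.0054, "00111100"), (0.0042, "00111101"),
    (0.0024, "001111100"), (0.0018, "001111110"),
    (0.0018, "001111101"), (0.0012, "001111111"),
]

def half_hc(code):
    """Permute equal-length codewords so that the frequency of 1s is near 0.5."""
    avg_len = sum(p * len(cw) for p, cw in code)
    lengths = sorted({len(cw) for _, cw in code})
    groups = [[i for i, (_, cw) in enumerate(code) if len(cw) == n]
              for n in lengths]

    def permuted(x):
        words = [None] * len(code)
        for xj, idx in zip(x, groups):
            # x_j = 0: decreasing number of 1s, x_j = 1: increasing.
            ordered = sorted((code[i][1] for i in idx),
                             key=lambda cw: cw.count("1"),
                             reverse=(xj == 0))
            for i, cw in zip(idx, ordered):
                words[i] = cw
        return words

    def frequency(words):
        ones = sum(p * cw.count("1") for (p, _), cw in zip(code, words))
        return ones / avg_len

    x = min(product((0, 1), repeat=len(groups)),
            key=lambda sel: abs(frequency(permuted(sel)) - 0.5))
    words = permuted(x)
    return x, words, frequency(words)

x, codewords, q = half_hc(HC)
print(x, round(q, 4))
```

Symbols with equal probability may receive their codewords in a
different order than in a reference implementation; this does not
change the frequency of 1s or the average codeword length.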







V. Numerical Results 

We apply halfHc to the English text [9]. We execute halfHc twice.
First, we use in step 3 exhaustive search (Subsection IV-A1). Second,
we use the combinatorial feasibility method (Subsection IV-A2). Both
methods find the same selection vector x, which is given by

$$x = (1, 0, 0, 0, 0, 0)^T. \tag{37}$$



The generated code is displayed in Table II. As can be seen, for
$\ell_1 = 3$, the codewords are sorted with increasing number of 1s
(corresponding to $x_1 = 1$), while for the remaining codeword
lengths, the codewords are sorted with decreasing number of 1s
(corresponding to $x_j = 0$). Notice the differences to the code
obtained by conventional Hc as displayed in Table I.

TABLE II
halfHc for the English text [9]

symbol    i    p_i      c_i          l(i)   j
(space)   1    0.1699   000          3      1
e         2    0.0964   110          3      1
t         3    0.0777   1110         4      2
a         4    0.0717   0111         4      2
i         5    0.0663   1010         4      2
o         6    0.0645   0101         4      2
n         7    0.0614   1000         4      2
r         8    0.0530   0100         4      2
s         9    0.0500   0010         4      2
h         10   0.0373   11111        5      3
c         11   0.0325   10011        5      3
l         12   0.0325   11110        5      3
m         13   0.0277   01101        5      3
u         14   0.0277   10110        5      3
d         15   0.0235   01100        5      3
f         16   0.0199   00110        5      3
g         17   0.0181   101111       6      4
y         18   0.0145   101110       6      4
p         19   0.0139   100101       6      4
b         20   0.0133   001110       6      4
w         21   0.0114   100100       6      4
v         22   0.0054   00111101     8      5
k         23   0.0042   00111100     8      5
x         24   0.0024   001111111    9      6
q         25   0.0018   001111110    9      6
z         26   0.0018   001111101    9      6
j         27   0.0012   001111100    9      6

The resulting effective frequency of 1s of halfHc is

$$q_{\mathrm{halfHc}} = 0.49985. \tag{38}$$



This is much closer to 0.5 than the value 0.45821 that resulted
from conventional Hc. Thus, halfHc achieved the first
objective given in (25), namely to get the frequency of 1s
closer to 0.5.

Let us now consider whether this has the desired effect on the
effective distributions that are generated by a matcher code. For the
English text [9], the resulting effective pmf $d_{\mathrm{halfHc}}$
is

$$d_{\mathrm{halfHc}} = (0.38627, 0.41107, 0.20266)^T. \tag{39}$$

The resulting KL-distance and average cost are respectively given by

$$D(d_{\mathrm{halfHc}} \| p^*) = 0.00048629 \tag{40}$$
$$w^T d_{\mathrm{halfHc}} = 0.20635. \tag{41}$$



Compared to Hc, the KL-distance is reduced by roughly a factor of ten
(0.00048629 versus 0.0050066). Thus, by using halfHc instead of
conventional Hc, the effective output of a matcher code is closer to
the output expected under the fair bit stream assumption. The
effective cost of Hc exceeds the cost constraint S = 0.2063 by 2.11%,
whereas halfHc exceeds it by only 0.02%. Although both Hc and halfHc
formally violate the cost constraint, the value achieved by halfHc
was adopted as a practical solution by the collaborating architects
in [7], who originally formulated the cost constraint S. We can
conclude that our approach of minimizing |q - 0.5| leads to the
desired result.
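
The two percentages can be reproduced from (1), (2), (9) and (39).
The following Python lines are a small arithmetic check; the function
name relative_excess is an assumption of this example.

```python
S = 0.2063                               # cost constraint, see (2)
w = (0.18, 0.18, 0.31)                   # symbol costs, see (1)
d_hc = (0.40663, 0.35761, 0.23576)       # effective pmf for Hc, see (9)
d_half = (0.38627, 0.41107, 0.20266)     # effective pmf for halfHc, see (39)

def relative_excess(d):
    """Relative amount by which the average cost w^T d exceeds S."""
    cost = sum(wi * di for wi, di in zip(w, d))
    return (cost - S) / S

excess_hc = relative_excess(d_hc)
excess_half = relative_excess(d_half)
print(f"Hc: {100 * excess_hc:.2f}%, halfHc: {100 * excess_half:.2f}%")
```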
Fig. 2. Comparison between Hc and halfHc for the English text [9]. The
horizontal axis corresponds to the average cost $w^T p$ while the vertical
axis corresponds to the KL-distance $D(p \| p^*)$. For Hc,
$p = d_{\mathrm{Hc}}$, see (9), and for halfHc,
$p = d_{\mathrm{halfHc}}$, see (39). The blue line marks the average cost
constraint S = 0.2063 of the original design problem (5).